ABSTRACT

This paper provides a survey of models for the 
analysis of individual growth data emphasizing the problems posed by 
serial or time dependence in the application of polynomial regression 
models. The concepts of serial correlation and autoregressive models 
are illustrated. It is demonstrated that standard inference 
procedures may be quite misleading when applied to polynomial 
regression models involving time dependence. Except for the econometric
model, the literature has given little consideration, for the case of
individual series, to the development of alternative procedures or to the
problem of providing a more reliable basis for inference.
(Author) 
Introductory Statement



The central mission of the Stanford Center for Research and Develop- 
ment in Teaching is to contribute to the improvement of teaching in 
American schools. Given the urgency of the times, technological develop- 
ments, and advances in knowledge from the behavioral sciences about teach- 
ing and learning, the Center works on the assumption that a fundamental 
reformulation of the future role of the teacher will take place. The 
Center's mission is to specify as clearly, and on as empirical a basis as 
possible, the direction of that reformulation, to help shape it, to fashion 
and validate programs for training and retraining teachers in accordance 
with it, and to develop and test materials and procedures for use in these
new training programs. 

The Center is at work in three interrelated problem areas: 

(a) Heuristic Teaching, which aims at promoting self-motivated and sustained
inquiry in students, emphasizes affective as well as cognitive
processes, and places a high premium upon the uniqueness of each pupil,
teacher, and learning situation; (b) The Environment for Teaching, which
aims at making schools more flexible so that pupils, teachers, and learning
materials can be brought together in ways that take account of their
many differences; and (c) Teaching Students from Low-Income Areas, which
aims to determine whether more heuristically oriented teachers and more
open kinds of schools can and should be developed to improve the education
of those currently labeled as the poor and the disadvantaged.


Research and Development Memorandum No. 74, which follows, presents 
a methodological development generated by the Methodology Unit in answer 
to problems encountered in the analysis of repeated measurements data. 

Such data analysis problems pose frequent difficulties in data gathered 
by Center projects. 



THE EFFECTS OF SERIAL DEPENDENCE ON POLYNOMIAL
REGRESSION MODELS FOR INDIVIDUAL GROWTH DATA

John P. Burke and Janet Dixon Elashoff^1

This paper surveys statistical models for the analysis of individual
growth data with the major emphasis on the problems posed by serial or
time dependence in the application of polynomial regression models. Time 
is considered to be an important variable, in contrast to situations in 
which repeated measurements are a device for reducing error variance or 
a convenience in data collection. 

The problems considered in the literature can be distinguished by
the number of individuals considered, n, and the number of measurements
per individual, p. For p = 2 and n sufficiently large (say, 10
or more), the problem is one of measuring or contrasting group "growth"
or "change." An extensive literature in educational and psychological
research has been devoted to the analysis of such two-observation repeated
measurements data (Cronbach & Furby, 1970; Lord, 1956, 1957, 1958, 1963;
McNemar, 1958; Werts & Linn, 1970, and many others). With a larger number
of time points, different methods of characterizing change in group
data arise. Under certain assumptions about the structure of the data,
analysis of variance techniques may be applied to the analysis of regression
models with the time measure (generally in orthogonal polynomial
form) as independent variable. Winer (1962, Ch. 7) discusses the simplest
form of analysis; Gaito and Wiley (1963) and Bock (1963) discuss more
general approaches and provide an introduction to the biometric literature
in which most attention has been given to this situation. Rao (1965) and
Grizzle and Allen (1969) have made more recent contributions.

^1 John P. Burke was a Research Assistant at the Center when this
paper was prepared; Janet Dixon Elashoff is Coordinator of the Methodology
Unit and Assistant Professor of Education at Stanford University.

For a single individual or system and p very large (over 100),
an extensive literature on the spectral analysis of time series has evolved,
primarily in the context of electrical engineering applications (see, e.g.,
Parzen, 1961). Holtzman (1963) discusses stochastic difference models for
psychological data.

The focus here is on applications to data on a single individual in
which p is moderate, say, in the range 5 to 15. Problems in the
investigation of such data include postulating a model to account for the
data, estimating parameters of the model to characterize the individual,
and testing hypotheses about an individual's curve. Questions of interest
may be: What is an individual's average score? Is there growth or learning
(a trend over time)? Is that trend linear, quadratic, exponential?

An example of the determination of characteristics of individual 
learning curves is provided by a study by Stake (1961) in which a theoret- 
ical learning function of hyperbolic form was fitted to data obtained over 
a series of trials, and the parameters of the function used as measures 
for an individual in subsequent analyses. Stake's procedures, however, 
take no account of the issues of dependence to be considered in this paper. 
(In fact, the psychological literature on learning curves generally assumes 
a nonstatistical error-free model. See, e.g., Estes, 1956.) Other models 
for individual change over time which do account for probable dependence 
are entering the literature of education and psychology as more attention 
is given to the idea of experimental time series (Gottman, McFall, &
Barnett, 1969; McGuire & Glass, 1967).

The following sections (a) review a standard approach to the 
analysis of individual growth data, based on a polynomial regression 
model, which ignores possible dependence among the observations; (b) 
introduce some possible models to account for serial dependency among the 
observations; (c) discuss the effects of serial dependence on the standard 
procedures; (d) outline some methods for detecting the existence of serial 
dependence; and (e) discuss some alternative approaches to the problem. 

Standard Approach Ignoring Dependence: Polynomial Regression

Suppose that observations, $y_t$, of some variable are taken on an
individual at p points in time. A natural and simple model to describe
the relationship between $y_t$ and time, t, is a polynomial regression
model:

(1)  $y_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \cdots + \beta_{k-1} t^{k-1} + e_t , \qquad k < p .$

With this polynomial regression model and some assumptions about
$e_t$, one can describe an individual's growth curve, and test hypotheses
about the initial score for an individual, or about the existence of trends
in scores over time. A polynomial regression may be intended as an exact
description of the process generating the data, or as an adequate approximation
to a more complex model for the relationship between $y_t$ and the
time measure. Or, one might apply regression techniques, without concern
for any underlying process, because the coefficients, the $\beta$'s, provide
useful descriptions of patterns in the data (see Fig. 1). Here attention
will be restricted to polynomial regression models although many other
types of models are possible (see Anderson, in press).

The standard least squares procedure for estimation and tests of
hypotheses of the parameters $\beta_0, \ldots, \beta_{k-1}$ will be valid if

(2)  $e_t \sim N(0, \sigma_e^2)$ ,
     $E(e_t e_{t'}) = 0 , \qquad t' \neq t .$

That is, for any t, the error term $e_t$ has a normal distribution
with mean zero and variance $\sigma_e^2$. In addition, $e_t$ is independent
of the error term at any other time $t'$, $t' \neq t$. In other words,
standard least squares procedures will be valid if the relationship
between $y_t$ and $y_{t'}$, $t' < t$, is due solely to the relationship of
the means of $y_t$ and $y_{t'}$ to t and t' and not to any dependence
between the value of the observation $y_t$ and the actual value observed
for $y_{t'}$.

It is convenient although not essential to assume that the time points 
are equally spaced, and it is so assumed here. Minor differences in pro- 
cedure arise for unequally spaced time points in the determination of 
orthogonal polynomial coefficients (see Appendix). Common practice 
varies in the specification of the time measure; the p points being 
denoted 0, 1, 2, ..., p-1 or 1, 2, 3, ..., p . The same form of 
analysis applies to either convention, although the interpretation of 
regression coefficients may differ. 
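The p equations represented by the general equation (1) may be put
in matrix form as

(3)
$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_t \\ \vdots \\ y_p \end{pmatrix}
=
\begin{pmatrix}
\beta_0 + \beta_1 1 + \beta_2 1^2 + \cdots + \beta_{k-1} 1^{k-1} \\
\beta_0 + \beta_1 2 + \beta_2 2^2 + \cdots + \beta_{k-1} 2^{k-1} \\
\vdots \\
\beta_0 + \beta_1 t + \beta_2 t^2 + \cdots + \beta_{k-1} t^{k-1} \\
\vdots \\
\beta_0 + \beta_1 p + \beta_2 p^2 + \cdots + \beta_{k-1} p^{k-1}
\end{pmatrix}
+
\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_t \\ \vdots \\ e_p \end{pmatrix}
$$

or

$y = X\beta + e$

where $y$ is the $p \times 1$ vector of observations $y_t$, $e$ is the
$p \times 1$ vector of errors $e_t$,

$$
X = \begin{pmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & 2 & 2^2 & \cdots & 2^{k-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & p & p^2 & \cdots & p^{k-1}
\end{pmatrix} ,
\qquad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{k-1} \end{pmatrix} .
$$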
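Assumptions (2) are that $e$ has a multivariate normal distribution with

$E(e) = 0$ ,

and variance-covariance matrix

$$
\Sigma =
\begin{pmatrix}
E(e_1^2) & E(e_1 e_2) & \cdots & E(e_1 e_p) \\
E(e_2 e_1) & E(e_2^2) & \cdots & E(e_2 e_p) \\
\vdots & & \ddots & \vdots \\
E(e_p e_1) & \cdots & \cdots & E(e_p^2)
\end{pmatrix}
=
\begin{pmatrix}
\sigma_e^2 & 0 & \cdots & 0 \\
0 & \sigma_e^2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & \cdots & \cdots & \sigma_e^2
\end{pmatrix} .
$$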

From least squares theory the estimators of the coefficients are
obtained by solving the equation

(4)  $\hat\beta = (X'X)^{-1} X' y .$

Under (2) the estimators $\hat\beta_0, \ldots, \hat\beta_{k-1}$ have a multivariate normal
distribution with means

$E(\hat\beta_i) = \beta_i , \qquad i = 0, \ldots, k-1$

and variance-covariance matrix

(5)  $V(\hat\beta) = \sigma_e^2 (X'X)^{-1} .$
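The estimation step in (4) and (5) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' procedure: the function name `fit_polynomial_ols`, the demonstration series, and the use of the residual mean square as the estimate of $\sigma_e^2$ are assumptions made here.

```python
import numpy as np

def fit_polynomial_ols(y, k, times=None):
    """Fit y_t = b_0 + b_1 t + ... + b_{k-1} t^(k-1) by ordinary least squares.

    Returns the coefficient estimates (equation 4) and their estimated
    variance-covariance matrix (equation 5 with sigma_e^2 replaced by the
    residual mean square, which is an assumption of this sketch).
    """
    y = np.asarray(y, dtype=float)
    p = y.size
    if times is None:
        t = np.arange(1, p + 1, dtype=float)   # time points 1, 2, ..., p
    else:
        t = np.asarray(times, dtype=float)
    # Design matrix X with columns 1, t, t^2, ..., t^(k-1)
    X = np.vander(t, N=k, increasing=True)
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y               # equation (4)
    residuals = y - X @ beta_hat
    sigma2_hat = residuals @ residuals / (p - k)
    cov_beta = sigma2_hat * xtx_inv            # equation (5), sigma_e^2 estimated
    return beta_hat, cov_beta

# Hypothetical noise-free series y_t = 2 + 3t, t = 1..6, used only as a check.
y_demo = 2.0 + 3.0 * np.arange(1, 7)
beta_demo, cov_demo = fit_polynomial_ols(y_demo, k=2)
```

Solving the normal equations by explicit inversion mirrors the formulas in the text; for badly conditioned designs a least squares solver such as `numpy.linalg.lstsq` would be the more stable choice.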

The following hypothetical example illustrates the application of 
regression techniques to growth-type data. Suppose there are measures at 
each of 10 equidistant time points on two individuals (Table 1 and 
Fig. 1). One can fit regression curves and consider how the parameters 
of these curves reflect apparent differences in the pattern of growth. 

In this case it seems appropriate to fit a quadratic curve to both
sets of data. The quadratic model may be expressed with either powers
of t or the orthogonal polynomial forms of these powers as the independent
variables (see Appendix). The orthogonal polynomial model for a quadratic
is written as

$y_t = \gamma_0 + \gamma_1 P_{1t} + \gamma_2 P_{2t} + e_t$

where $P_{it}$ are the orthogonal polynomial coefficients of degree i.

Applying standard least squares procedures for orthogonal polynomials
(Appendix formula A.4), the data in Table 1 yield the estimates shown in
Table 2.

TABLE 1

Hypothetical Data for 10 Time Points on Two Individuals

                                      Time
Individual     0    1    2    3    4    5    6    7    8    9

    1         20   18   25   22   28   36   50   55   70   73
    2         20   33   48   52   66   63   76   78   83   81

TABLE 2 

Least Squares Estimates of Orthogonal Polynomial 
Coefficients for Data in Table 1 
O: Individual 1    X: Individual 2

Fig. 1. Plots of hypothetical data for 10 time points on two 
individuals from Table 1 with fitted curves showing mean, linear, and 
quadratic components. 
Examination of the coefficients shows that the mean $\hat\gamma_0$ (shown
on Fig. 1) is considerably lower for individual 1. The coefficient
$\hat\gamma_1$, which reflects the linear component of the trend (the straight lines
in Fig. 1 show mean + linear component), is nearly the same for individuals
1 and 2. The coefficients $\hat\gamma_2$ clearly reflect the different characteristics
of the quadratic components of the two individuals. Note that
$\hat\gamma_2$ for individual 1 is positive indicating a concave curve, which might
be interpreted as "early learning," while $\hat\gamma_2$ for individual 2 is negative
indicating a convex curve or "late learning." The curved lines in
Figure 1 are the fitted quadratic curves; the difference between the corresponding
straight and curved lines is the quadratic component contributed
by $\hat\gamma_2$.
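The comparison of the two individuals can be reproduced in outline from the Table 1 data. The sketch below builds orthogonal polynomials by Gram-Schmidt orthogonalization of the powers of t; the scaling of the $P_{it}$ (degree-0 column of ones, leading coefficient one for higher degrees) and the function names are assumptions made here, since Appendix formula A.4 is not reproduced in this section. The numerical values of the coefficients therefore depend on that scaling, but their signs and the equality of $\hat\gamma_0$ with the mean do not.

```python
import numpy as np

# Data from Table 1 (time points 0, 1, ..., 9).
y1 = np.array([20, 18, 25, 22, 28, 36, 50, 55, 70, 73], dtype=float)
y2 = np.array([20, 33, 48, 52, 66, 63, 76, 78, 83, 81], dtype=float)

def orthogonal_polynomials(p, degree):
    """Return a p x (degree + 1) matrix of mutually orthogonal polynomial columns.

    Column d is t**d with its projections on the lower-degree columns removed
    (Gram-Schmidt), for equally spaced t = 0, ..., p - 1.
    """
    t = np.arange(p, dtype=float)
    columns = []
    for d in range(degree + 1):
        v = t ** d
        for c in columns:
            v = v - (v @ c) / (c @ c) * c
        columns.append(v)
    return np.column_stack(columns)

def orthogonal_coefficients(y, P):
    """Least squares estimates gamma_i = sum_t P_it y_t / sum_t P_it^2."""
    return (P.T @ y) / np.sum(P * P, axis=0)

P = orthogonal_polynomials(10, 2)
gamma1 = orthogonal_coefficients(y1, P)   # mean, linear, quadratic for individual 1
gamma2 = orthogonal_coefficients(y2, P)   # mean, linear, quadratic for individual 2
```

Because the columns are orthogonal, each coefficient is estimated separately, so dropping or adding a higher-degree term leaves the lower-degree estimates unchanged.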



Models for Serial Dependence 

Suppose a polynomial regression model such as that given by
conditions (1) and (2), while being of the right general form, does not
fit the data well; that is, the value of $y_t$ seems to have some dependence
on the actual value of $y_{t'}$, not explained by the dependence of
the mean of $y_t$ on t. A number of different models to describe this
sort of situation have been proposed.

Models based on stochastic difference equations express the
dependence of observations on preceding observations rather than on the
time measure. A simple model of this type is the first-order model:

(6)  $y_t = \rho_1 y_{t-1} + e_t ,$

where the error component $e_t$ is assumed to have the following
properties:

$E(e_t) = 0$

$E(e_t e_{t'}) = \begin{cases} \sigma_e^2 & t = t' \\ 0 & t \neq t' \end{cases}$

$E(e_t y_{t-s}) = 0 , \qquad 1 \le s \le t-1 .$

That is, the $e_t$ have identical distributions with zero mean and constant
variance ($\sigma_e^2$), and are independent of the preceding observations.

This model is referred to as a first-order stochastic difference
equation, Markov chain, or autoregressive process. In this first-order
model an observation $y_t$ is assumed to be dependent on the immediately preceding
observation $y_{t-1}$ but not directly dependent on any observations
preceding $y_{t-1}$. Other models may be proposed reflecting higher order
dependence such as a stochastic difference equation of the second order

$y_t = \rho_1 y_{t-1} + \rho_2 y_{t-2} + e_t .$

The graphs of Figure 2 illustrate the kinds of series generated 
by stochastic difference and regression models. The first-order 
stochastic difference model generates a series in which there is oscil- 
lation about a mean value (0) with no overall trend, and a tendency for 
runs of positive and negative values. The linear regression model gives 
rise to a trend and is more appropriate to processes in which an increase 
in level with time is expected. 
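Series of the two kinds contrasted above can be generated with a short simulation. This is a sketch under stated assumptions: the slope 0.4, the coefficient 0.4, and the error variance 4 for the regression model follow the panel titles of Fig. 2, while the unit error variance of the stochastic difference model, the starting value $y_0 = 0$, the seeds, and the function names are choices made here for illustration.

```python
import numpy as np

def simulate_linear_regression(n, slope=0.4, error_sd=2.0, rng=None):
    """y_t = slope * t + e_t with independent e_t ~ N(0, error_sd**2), t = 1..n."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(1, n + 1)
    return slope * t + rng.normal(0.0, error_sd, size=n)

def simulate_stochastic_difference(n, rho=0.4, error_sd=1.0, rng=None):
    """y_t = rho * y_{t-1} + e_t with independent e_t ~ N(0, error_sd**2), y_0 = 0."""
    rng = np.random.default_rng() if rng is None else rng
    e = rng.normal(0.0, error_sd, size=n)
    y = np.empty(n)
    previous = 0.0
    for i in range(n):
        previous = rho * previous + e[i]   # each value depends on the one before
        y[i] = previous
    return y

rng = np.random.default_rng(0)
trend_series = simulate_linear_regression(20, rng=rng)   # rises with t
ar_series = simulate_stochastic_difference(20, rng=rng)  # oscillates about 0
```

Plotting the two arrays against t = 1, ..., 20 gives pictures of the same general character as those described in the text: a noisy upward trend for the first and runs of positive and negative values about zero for the second.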

A more interesting approach to growth data of this nature may be to
retain the basic polynomial regression model described in (1) but to
specify a model for serial dependence among the errors or residuals.
Descriptions of several of the commonly used models of this type will be
helpful; each requires different assumptions about the nature of serial
dependence among the errors.

Fig. 2. Data generated by linear regression and stochastic difference
models (note different scales). First panel, Linear Regression Model:
$y_t = 0.4t + e_t$, $e_t \sim N(0, 4)$. Second panel, Stochastic Difference
Model: $y_t = 0.4 y_{t-1} + e_t$. Horizontal axis in both panels: t = 1, ..., 20.

A basic assumption of the standard least squares estimates for
model (1) is that the errors have constant variance and are independent
of each other; i.e., the variance-covariance matrix is $\Sigma = \sigma_e^2 I$. Serial
correlation or dependence is said to exist whenever there is correlation
between any pair of error terms; i.e., $E(e_t e_{t'}) \neq 0$ for some t and $t' \neq t$.
Usually a particular model based on empirical evidence or theoretical
considerations is chosen to represent the form of serial correlation.
Here the characteristics of two particular models will be considered, the
autoregressive error model, and the cumulative error model.

The most commonly used model in the literature is the first-order
stochastic difference equation or autoregressive error model^2:

(8)  $e_t = \rho e_{t-1} + u_t$

(9)  $E(u_t) = 0 , \qquad E(e_{t'} u_t) = 0 , \quad t' < t , \qquad E(u_t u_{t'}) = \begin{cases} \sigma_u^2 & t = t' \\ 0 & t \neq t' \end{cases}$

Hence $E(e_t) = 0$ and $E(e_t^2) = \sigma_e^2$.

In other words, the error term $e_t$ is a linear function of the error
term at time t-1, plus a term $u_t$ representing additional unaccounted-for
variability.

^2 This model for the error term is exactly like the stochastic difference
model (6) for observations previously considered.
It can be shown that $E(e_t e_{t+s}) = \rho^s \sigma_e^2$. Hence the variance-covariance
matrix of errors generated by this process is

$$
\Sigma_a = \sigma_e^2
\begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{p-1} \\
\rho & 1 & \rho & \cdots & \rho^{p-2} \\
\vdots & & \ddots & & \vdots \\
\rho^{p-1} & \rho^{p-2} & \cdots & \rho & 1
\end{pmatrix} .
$$

Such a pattern of variance-covariance matrix, in which off-diagonal correlations
decrease monotonically, is often referred to as a simplex.
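The structure of $\Sigma_a$ is easy to generate numerically, which is convenient for the efficiency and variance calculations discussed later. The following is a minimal sketch; the function name `autoregressive_covariance` and the example values p = 4, $\rho$ = 0.5 are assumptions for illustration.

```python
import numpy as np

def autoregressive_covariance(p, rho, sigma2_e=1.0):
    """Return the p x p matrix with (i, j) entry sigma_e^2 * rho**|i - j|."""
    index = np.arange(p)
    lags = np.abs(np.subtract.outer(index, index))   # |i - j| for every pair
    return sigma2_e * rho ** lags

sigma_a = autoregressive_covariance(4, 0.5)
first_row = sigma_a[0]   # covariances of e_1 with e_1, e_2, e_3, e_4
```

With $\rho$ = 0.5 the first row is 1, 0.5, 0.25, 0.125, showing the steady decay of correlation with lag that characterizes the simplex pattern.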

In the context of this autoregressive model for serial dependence,
a population serial correlation coefficient of lag s is defined as the
correlation between error terms s units apart. Algebraically,

(10)  $\rho_s = \dfrac{E(e_t e_{t+s})}{\sqrt{E(e_t^2)\, E(e_{t+s}^2)}} .$

Under model (8), $\rho_s = \dfrac{E(e_t e_{t+s})}{\sigma_e^2} = \rho^s .$

The sample serial correlation coefficient of lag 1 of the residuals
from regression provides an estimator of $\rho$. If one defines the residuals
$z_t = y_t - \hat y_t$, where $\hat y_t$ is the standard least squares estimate at time
t, then one definition of the sample coefficient of lag s is

(11)  $r_s = \dfrac{\sum_{t=s+1}^{p} z_t z_{t-s}}{\sum_{t=1}^{p} z_t^2}$ ,

which may be likened to a product moment correlation between the series
$z_{s+1}, \ldots, z_p$ and $z_1, \ldots, z_{p-s}$. This coefficient provides a statistic
useful for detecting dependence in observed data.
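A direct translation of the lag-s sample coefficient into code may help fix the definition. The sketch below takes the coefficient to be the sum of lagged products of residuals divided by the sum of squared residuals; the function names, the helper that obtains residuals with `numpy.polyfit`, and the small alternating series used as a check are assumptions made here.

```python
import numpy as np

def sample_serial_correlation(z, s=1):
    """Lag-s coefficient: sum_{t=s+1}^{p} z_t z_{t-s} divided by sum_{t=1}^{p} z_t^2."""
    z = np.asarray(z, dtype=float)
    return float(z[s:] @ z[:-s] / (z @ z))

def polynomial_residuals(y, degree):
    """Residuals z_t = y_t - yhat_t from an ordinary least squares polynomial fit."""
    y = np.asarray(y, dtype=float)
    t = np.arange(1, y.size + 1, dtype=float)
    coefficients = np.polyfit(t, y, degree)
    return y - np.polyval(coefficients, t)

# A strictly alternating residual series gives a strongly negative coefficient.
r_alternating = sample_serial_correlation([1.0, -1.0, 1.0, -1.0], s=1)

# Residuals from a quadratic fit to a hypothetical growth series.
growth = [20, 18, 25, 22, 28, 36, 50, 55, 70, 73]
r_growth = sample_serial_correlation(polynomial_residuals(growth, 2), s=1)
```

For the alternating series the numerator is -3 and the denominator 4, so the coefficient is -0.75; the denominator uses all p residuals, so the coefficient always lies between -1 and 1.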

The cumulative error model is less commonly encountered, but
provides an interesting comparison with the previous model. Here it is
proposed that

(12)  $e_t = e_{t-1} + u_t$   or   $e_t = \sum_{i=1}^{t} u_i ,$

and conditions (9) hold for $e_t$ and $u_t$. It follows that $E(e_t) = 0$
and $E(e_t e_{t+s}) = t \sigma_u^2$ ($s \ge 0$), and hence the variance-covariance
matrix of errors is

$$
\Sigma_c = \sigma_u^2
\begin{pmatrix}
1 & 1 & 1 & \cdots & 1 & 1 \\
1 & 2 & 2 & \cdots & 2 & 2 \\
1 & 2 & 3 & \cdots & 3 & 3 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
1 & 2 & 3 & \cdots & p-1 & p-1 \\
1 & 2 & 3 & \cdots & p-1 & p
\end{pmatrix} .
$$
In the cumulative error model, the error at time t is composed of
the error at time t-1 plus an independent component $u_t$. The increment
$u_t$ may be conceived of as an error arising during the time interval
t-1 to t. The conditions under which one might expect this model to
hold are discussed by Mandel (1957).

Note that the cumulative error model is the same form as the autoregressive
model but with $\rho = 1$. The class of first-order autoregressive
models with $|\rho| \ge 1$ is often denoted as the class of nonstationary
autoregressive processes, while $|\rho| < 1$ defines the class of stationary
autoregressive processes. Stationary processes have constant variance across
time; note the ones in the diagonals of $\Sigma_a$. The variances of nonstationary
processes increase or "explode" over time; note the diagonal terms in $\Sigma_c$.
Observe that in the consideration of data for a single individual, an
estimate of $\Sigma_a$ or $\Sigma_c$ would not be available to provide a suggestion of
which underlying error model was appropriate. In order to get an estimate
of $\Sigma$, replications of the sequence of p observations would be needed.



The cumulative error model generates different between-times
correlations as well as different variances. Thus

$\rho_{t,t+s} = \dfrac{\sigma_{t,t+s}}{\sigma_t \sigma_{t+s}} = \dfrac{t \sigma_u^2}{\sqrt{t \sigma_u^2 \,(t+s) \sigma_u^2}} = \sqrt{\dfrac{t}{t+s}} .$

For any value of s this correlation is a function of t. Hence
there is no single population serial correlation coefficient of lag s,
$\rho_s$, as in the autoregressive model.
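The dependence of the cumulative-model correlation on t can be seen numerically. In the sketch below the covariance of the errors at times i and j is taken as $\sigma_u^2 \min(i, j)$, and the correlation between $e_t$ and $e_{t+s}$ then simplifies to $\sqrt{t/(t+s)}$; the function names and the example size p = 4 are assumptions for illustration.

```python
import numpy as np

def cumulative_covariance(p, sigma2_u=1.0):
    """Return the p x p matrix with (i, j) entry sigma_u^2 * min(i, j), i, j = 1..p."""
    times = np.arange(1, p + 1)
    return sigma2_u * np.minimum.outer(times, times)

def cumulative_correlation(t, s):
    """Correlation between e_t and e_{t+s} under the cumulative error model."""
    return np.sqrt(t / (t + s))

sigma_c = cumulative_covariance(4)
# Lag-1 correlations at t = 1, 2, 3 read off the covariance matrix.
lag1_from_matrix = [sigma_c[i, i + 1] / np.sqrt(sigma_c[i, i] * sigma_c[i + 1, i + 1])
                    for i in range(3)]
```

The lag-1 correlations rise with t (about 0.71, 0.82, 0.87 for t = 1, 2, 3), whereas under the stationary autoregressive model the lag-1 correlation is the same at every t.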
The point here is that, while the autoregressive model is the
one most frequently encountered in the literature and is often considered
to be the model for serial correlation, other models may be worth considering.
The cumulative error model is one example. Other possibilities
are second-order autoregressive models, or a model such as that used by
Box^3. These models may generate rather different patterns of serial
dependence from those generated by the stationary autoregressive model.
Furthermore, such concepts as the serial correlation coefficient may not
be generally meaningful in other models.



Effects of Serial Correlation on Conventional Least Squares Procedures 



This section will consider the effects of serial correlation on
ordinary least squares procedures for estimation of parameters in (1) in terms
of: (a) bias in the least squares estimators of regression coefficients;
(b) the efficiency of the least squares estimators; (c) the validity of
hypothesis tests and confidence intervals based on the conventional procedure.

The problem of prediction of values beyond the range of the time 
measure is considered by Johnston (1963, pp. 195-199), in the context of 
the general regression model encountered in econometric applications. 



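^3 Box (1954), in studying the effects of serial correlation on analysis of
variance, used

$$
\Sigma =
\begin{pmatrix}
1 & \rho & 0 & \cdots & 0 \\
\rho & 1 & \rho & \cdots & 0 \\
0 & \rho & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \rho \\
0 & 0 & \cdots & \rho & 1
\end{pmatrix} .
$$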
Generally speaking, the results presented in this section may be
summarized by saying that the ordinary least squares estimators of the
regression coefficients, $\beta$, are unbiased and reasonably efficient
(that is, the variances are not appreciably larger than the minimum
obtainable variances) but tests and confidence intervals for the true
values of $\beta$ may be seriously misleading.

Figure 3 illustrates the effect of serial correlation, showing data
generated by a linear regression model with autoregressive error term
for $\rho = 0.2$ and $0.8$. $\hat\beta_0$ and $\hat\beta_1$ are the ordinary least squares
estimates, $\hat\rho$ is the $r_1$ of (11), and d is the Durbin-Watson
d-statistic (22) whose use will be discussed later. Discrepancies between
the line specified by the model and the line fitted using ordinary
least squares techniques are evident, particularly for $\rho = 0.8$. Note
that the residuals from the fitted line tend to be more random than the
actual errors. Consequently, $\hat\rho$ is an underestimate of $\rho$.

Bias 

It is readily shown that the estimators obtained by ordinary least
squares techniques are unbiased under any form of serial correlation
(Johnston, 1963, p. 188) as long as $E(e_t) = 0$. This means that the expected
value of a coefficient estimated from replications of a given series is
the population coefficient.
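The unbiasedness result follows in one line from the least squares formula $\hat\beta = (X'X)^{-1}X'y$, because the time points in $X$ are fixed and only $E(e) = 0$ is used; no assumption about the covariance of the errors enters. The derivation below is the standard argument written out for convenience, not a quotation from Johnston.

```latex
E(\hat\beta)
  = E\left[(X'X)^{-1}X'y\right]
  = (X'X)^{-1}X'\,E(X\beta + e)
  = \beta + (X'X)^{-1}X'\,E(e)
  = \beta .
```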

Efficiency 

The most efficient estimator of a parameter (out of a class of
possible estimators) is that for which the variance of the estimate is
least. Although two estimators may both be unbiased, the more efficient
will tend to provide individual estimates closer to the population value,
narrower confidence intervals, and consequently more powerful tests of
hypotheses. When the conventional assumptions of independence (2) are
met, the ordinary least squares (o.l.s.) estimators of the regression
coefficients are the most efficient of any unbiased estimators which
are linear functions of the observed values.

Fig. 3. Generated data and fitted regression lines for model:
$y_t = \beta_0 + \beta_1 t + e_t$, $e_t = \rho e_{t-1} + u_t$.

When serial correlation exists the least squares estimators may
not be the most efficient. If the error variance-covariance matrix $\Sigma$
is assumed known, it can be shown that the most efficient linear unbiased
estimators of the $\beta$ coefficients are those obtained by techniques
known as generalized least squares (Johnston, sect. 7.3). These
estimators, the g.l.s. estimators, are given by

(13)  $\beta^* = (X' \Sigma^{-1} X)^{-1} X' \Sigma^{-1} y .$

In practice, $\Sigma$ would not be known. However, calculation of
$\mathrm{Var}(\beta_i^*) / \mathrm{Var}(\hat\beta_i)$ for various specifications of $\rho$ and $\Sigma$ provides a lower
bound for the efficiency of the o.l.s. estimators relative to any other
linear unbiased estimators one might devise.

For general error variance-covariance matrix $\Sigma$, the variances for
g.l.s. and o.l.s. estimators are given by:

(14)  $\mathrm{Var}(\beta^*) = (X' \Sigma^{-1} X)^{-1}$

      $\mathrm{Var}(\hat\beta) = (X'X)^{-1} X' \Sigma X (X'X)^{-1} .$

(The formula for $\mathrm{Var}(\hat\beta)$ reduces to formula (5) when $\Sigma = \sigma_e^2 I$, the
case of independence.)
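The relative efficiency of the o.l.s. estimators can be computed directly from the two variance formulas for any assumed error covariance matrix. The sketch below does this for a quadratic in t on five time points with autoregressive errors. The function names, the choice of time points 1 to 5, and the use of raw powers of t rather than orthogonal polynomials are assumptions made here, so the values for the lower-order coefficients need not coincide with published tables that use a different parameterization.

```python
import numpy as np

def design_matrix(p, k):
    """Columns 1, t, ..., t^(k-1) for time points t = 1, ..., p."""
    t = np.arange(1, p + 1, dtype=float)
    return np.vander(t, N=k, increasing=True)

def autoregressive_covariance(p, rho):
    """Matrix with (i, j) entry rho**|i - j| (sigma_e^2 taken as 1)."""
    index = np.arange(p)
    return rho ** np.abs(np.subtract.outer(index, index))

def relative_efficiency(X, sigma):
    """Var(g.l.s. estimator) / Var(o.l.s. estimator) for each coefficient."""
    xtx_inv = np.linalg.inv(X.T @ X)
    var_ols = xtx_inv @ X.T @ sigma @ X @ xtx_inv
    var_gls = np.linalg.inv(X.T @ np.linalg.inv(sigma) @ X)
    return np.diag(var_gls) / np.diag(var_ols)

X = design_matrix(5, 3)
efficiency_by_rho = {
    rho: relative_efficiency(X, autoregressive_covariance(5, rho))
    for rho in (-0.8, -0.4, 0.0, 0.4, 0.8)
}
```

Each ratio lies between 0 and 1, and equals 1 when $\rho = 0$, since the g.l.s. and o.l.s. estimators then coincide.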

Table 3 gives the relative efficiencies of the o.l.s. and g.l.s.
estimators of the polynomial coefficients $\beta_0$, $\beta_1$, $\beta_2$, based on five
time points, for the autoregressive model ($\Sigma = \Sigma_a$) with $\rho$ varying
from -0.8 to 0.8, and for the cumulative error model. Note that
relative efficiency increases with increase in number of time points.
(See Rosenblatt, 1956, for tables of variances for o.l.s. and g.l.s. or
Markov estimates for 10, 15, 20, and 50 time points in a linear regression
model.)

It is clear that the relative efficiency of the o.l.s. estimators is
generally high except for large negative $\rho$. Decisions between alternative
estimators based on relative efficiency depend on the complexity
of calculation for the alternative estimators, and problem-specific
decisions about power, etc.; however, except for $\rho < -0.6$, the o.l.s.
estimators have satisfactorily small variances under the autoregressive
model.

Validity of Hypothesis Tests and Confidence Intervals 

There are a variety of hypotheses about the values of regression
coefficients that may be tested; e.g., $H_0: \beta_1 = 0$ or $H_0: \beta_0 = \beta_1 = \beta_2 = 0$
($\beta = 0$). Under the ordinary assumptions of independence (2), t- and
F-tests are appropriate for these hypotheses. There is evidence, however,
that the existence of serial correlation seriously affects the validity of
the tests.
TABLE 3

Relative Efficiency of O.L.S. and G.L.S. Estimators of
Polynomial Coefficients Based on 5 Time Points

                                             Autoregressive Error Model, $\rho$                 Cumulative
Ratio                                  -0.8  -0.6  -0.4  -0.2   0.0   0.2   0.4   0.6   0.8    Error Model

Var($\beta_0^*$) / Var($\hat\beta_0$)       0.43  0.75  0.92  0.99  1.00  1.00  0.98  0.98  0.96       0.96
Var($\beta_1^*$) / Var($\hat\beta_1$)       0.38  0.73  0.92  0.99  1.00  0.99  0.99  0.98  0.98       0.98
Var($\beta_2^*$) / Var($\hat\beta_2$)       0.37  0.72  0.92  0.99  1.00  0.99  0.99  0.98  0.98       0.98



TABLE 4

Probability of Type I Error When Nominal Significance
Level Is 0.05 Under Serial Correlation

                                   $\rho$
              -0.4    -0.2    0.0    0.2    0.4    0.6    0.8

Example 1     .0026   .0164   .05    .11    .20     -      -
Example 2                     .05    .14    .38    .70    .92





Computations of the probability of a Type I error in two different
situations illustrate how the use of standard hypothesis tests derived
under (1) and (2) can be extremely misleading under the autoregressive
error model described by (8) and (9).

In Table 4, example 1 is for a two-sided t-test of $\beta_1 = 0$ in
the model $y_t = \beta_0 + \beta_1 t + e_t$, $t = 1, \ldots, p$ and p large (Elashoff,
1968). Example 2 is for the F-test of $\beta_0 = \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ in
the model $y_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \beta_3 t^3 + \beta_4 t^4 + e_t$, $t = \ldots, 1$
(Hoel, 1964). Both are examined under the autoregressive error model

$e_t = \rho e_{t-1} + u_t$

where $E(e_t) = 0$ and

$$
\Sigma = \sigma_e^2
\begin{pmatrix}
1 & \rho & \cdots & \rho^{p-1} \\
\rho & 1 & \cdots & \rho^{p-2} \\
\vdots & & \ddots & \vdots \\
\rho^{p-1} & \rho^{p-2} & \cdots & 1
\end{pmatrix} .
$$




It is apparent that the probability of a Type I error is strongly 
affected by p . When positive serial correlation exists, the null 
hypothesis is likely to be rejected far too frequently. 

If the ordinary least squares estimators are unbiased and relatively
efficient even when serial correlation exists, why are hypothesis tests
so misleading? When serial correlation exists, the assumptions of independence
of errors which justify the derivation of t- and F-statistics
no longer hold (Elashoff, 1968). In particular, the formula (5) for
the variances of estimators no longer holds, and the usual estimator $\hat\sigma_e^2$
of $\sigma_e^2$ is a serious underestimate when serial correlation exists.

The general formula for the variance of estimates under assumptions
of independence is

(15)  $V(\hat\beta) = \sigma_e^2 (X'X)^{-1} .$

When dependence exists the appropriate formula is

(16)  $V(\hat\beta) = (X'X)^{-1} X' \Sigma X (X'X)^{-1} .$

For example, the standard test of $H_0: \beta_1 = 0$ is based on using
the formula

(17)  $\mathrm{Var}(\hat\beta_1) = \dfrac{\sigma_e^2}{\sum_t (t - \bar t)^2}$

for the variance of $\hat\beta_1$ in the formula for the t-statistic. However, if
the errors follow the autoregressive model given by (8) and (9) the correct
formula for the variance for p large is approximately

(18)  $\mathrm{Var}(\hat\beta_1) \approx \dfrac{\sigma_e^2}{\sum_t (t - \bar t)^2} \left( 1 + \dfrac{2\rho}{1 - \rho} \right) .$

The difference between these formulae can be quite substantial; their
ratio is $(1 - \rho)/(1 + \rho)$ (see Table 5).



TABLE 5

Ratio of Standard Formula for Var $\hat\beta_1$ to Correct Formula for
Var $\hat\beta_1$ Under the Autoregressive Error Model

                                      $\rho$
                         -0.6   -0.4   -0.2     0     0.2    0.4    0.6

$(1-\rho)/(1+\rho)$       4.0    2.33   1.5     1     0.67   0.43   0.25
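The ratio of the standard variance formula (17) to the large-p approximation (18) can be evaluated for any $\rho$ with a one-line function. The sketch assumes that the bracketed factor in (18) is $1 + 2\rho/(1-\rho)$, which equals $(1+\rho)/(1-\rho)$ and is consistent with the ratio $(1-\rho)/(1+\rho)$ named in Table 5; the function name is hypothetical.

```python
def variance_ratio(rho):
    """Ratio of formula (17) to approximation (18), i.e. (1 - rho) / (1 + rho)."""
    inflation = 1.0 + 2.0 * rho / (1.0 - rho)   # assumed factor multiplying (17) in (18)
    return 1.0 / inflation

ratios = {rho: variance_ratio(rho) for rho in (-0.6, -0.4, -0.2, 0.0, 0.2, 0.4, 0.6)}
```

A ratio above 1 (negative $\rho$) means the standard formula overstates the variance of the slope estimate; a ratio below 1 (positive $\rho$) means it understates the variance, which is the situation in which null hypotheses are rejected too often.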



The usual estimator $\hat\sigma_e^2$ of the error variance, obtained by determining
the mean square of the residuals from regression, is seriously
biased downward. Observation of the graphs in Figure 3 gives some insight
into this problem: the least squares lines actually "fit" better
than they should, in the sense of reducing the squares of the deviations
from the regression line. Hence the residual mean square is much less
than the error variance. The average extent of bias under particular
polynomial regression models can be determined algebraically. The problem
has been investigated by Watson (1955) and Watson and Hannan (1956)
at a general theoretical level.
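The algebraic determination of the bias can be sketched as follows. For residuals $z = My$ with $M = I - X(X'X)^{-1}X'$, the expected residual sum of squares is $\mathrm{trace}(M\Sigma)$, so the expected residual mean square is $\mathrm{trace}(M\Sigma)/(p-k)$. This identity is a standard result used here for illustration rather than a formula taken from Watson (1955); the function name, the choice of a linear model on 10 time points, and the values of $\rho$ are assumptions.

```python
import numpy as np

def expected_residual_mean_square(X, sigma):
    """E(z'z) / (p - k) = trace(M @ sigma) / (p - k), with M = I - X (X'X)^{-1} X'."""
    p, k = X.shape
    M = np.eye(p) - X @ np.linalg.inv(X.T @ X) @ X.T
    return np.trace(M @ sigma) / (p - k)

p = 10
t = np.arange(1, p + 1, dtype=float)
X = np.column_stack([np.ones(p), t])                  # linear regression on time
index = np.arange(p)
lags = np.abs(np.subtract.outer(index, index))

# With sigma_e^2 = 1 the result is the ratio E(residual mean square) / sigma_e^2.
bias_ratio = {rho: expected_residual_mean_square(X, rho ** lags)
              for rho in (0.0, 0.4, 0.8)}
```

The ratio is 1 under independence and falls below 1 as positive serial correlation increases, which is the downward bias described in the text.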

Econometric Studies of the Effects of Serial Correlation 

Much of the literature dealing with serial correlation and its
effects is based on a model appropriate to
economic studies (e.g., Cochrane & Orcutt, 1949; Johnston, 1963; Rao &
Griliches, 1969). The econometric model is sufficiently different from the
polynomial regression model with autoregressive error that the conclusions
based on it are either not directly pertinent to the polynomial model,
or need to be interpreted with caution. However, the econometric model
has received considerable study and sheds light on the effects of serial
correlation.

Econometric models are of the form

(19)    $y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \cdots + e_t$

where $\beta_0$ is frequently taken to be 0. (In an econometric context, the
$x_{it}$ may be such variables as annual prices of some commodities.) The
major difference from the polynomial regression model considered in this
paper is the use of random variables $x_{it}$ measured at each time point
rather than powers of t to predict $y_t$. More sophisticated treatment
of the "time" measure or the use of other indicators of change in
conditions could lead to the application of econometric-type models. T. W.
Anderson (1963) has pointed out that econometricians generally consider
the inclusion of time variables to be substitutes for other unknown
variables whose values are related to the time measure.

In particular, Rao and Griliches (1969) study a simple econometric
model:

(20)    $y_t = \beta x_t + e_t$
        $x_t = \lambda x_{t-1} + v_t$
        $e_t = \rho e_{t-1} + u_t$

        $E(v_t) = E(u_t) = E(v_t u_t) = E(u_t u_{t-1}) = E(v_t v_{t-1}) = 0$
        $E(v_t^2) = \sigma_v^2 , \quad E(u_t^2) = \sigma_u^2 ,$
        $|\lambda| < 1 , \quad |\rho| < 1 .$



In contrast, the linear regression model with autoregressive error term is

(21)    $y_t = \beta_0 + \beta_1 t + e_t$
        $e_t = \rho e_{t-1} + u_t$

        $E(u_t) = E(u_t u_{t-1}) = 0$
        $E(u_t^2) = \sigma_u^2 , \quad |\rho| < 1 .$



Examining both models, it will be noted that in model (20) the constant
term $\beta_0$ has been dropped and a random variable $x_t$ with an autoregressive
probability structure has been substituted for t. A comparison of t
written as $t = (t-1) + 1$ with $x_t = \lambda x_{t-1} + v_t$ suggests that results
for the Rao and Griliches model for $\lambda$ close to 1 should be most
similar to results for the linear regression model.

For model (21), efficiency computations may be made directly; as
seen in Table 3, the ordinary least squares estimators are generally
efficient. For econometric models similar to (20), however, efficiency
computations based on sampling experiments (necessary because $x_t$ is a
random variable) indicate that ordinary least squares estimators may not be
satisfactory (see Cochrane & Orcutt, 1949, and Rao & Griliches, 1969).
Rao and Griliches (1969) indicate that for $\lambda$ close to 1 ordinary
least squares estimators are relatively efficient in the econometric
model as one would expect.
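
A sampling experiment of this general kind can be sketched as follows. This is an illustration only, not a reproduction of the designs used by Cochrane and Orcutt or by Rao and Griliches. It assumes normal disturbances with unit variance, a stationary start for both series, and a comparison estimator that uses the true value of $\rho$ to quasi-difference the data (dropping the first observation); the function name and all default settings are hypothetical.

```python
import numpy as np

def simulate_efficiency(beta=1.0, lam=0.0, rho=0.8, n=50, reps=2000, seed=0):
    """Monte Carlo estimate of Var(GLS with known rho) / Var(OLS) for model (20)."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((reps, n))
    u = rng.standard_normal((reps, n))
    x = np.empty((reps, n))
    e = np.empty((reps, n))
    x[:, 0] = v[:, 0] / np.sqrt(1.0 - lam ** 2)    # stationary starting values
    e[:, 0] = u[:, 0] / np.sqrt(1.0 - rho ** 2)
    for t in range(1, n):
        x[:, t] = lam * x[:, t - 1] + v[:, t]      # x_t = lambda x_{t-1} + v_t
        e[:, t] = rho * e[:, t - 1] + u[:, t]      # e_t = rho e_{t-1} + u_t
    y = beta * x + e                               # no constant term, as in (20)
    b_ols = (x * y).sum(axis=1) / (x * x).sum(axis=1)
    xs = x[:, 1:] - rho * x[:, :-1]                # quasi-differenced regressor
    ys = y[:, 1:] - rho * y[:, :-1]                # quasi-differenced response
    b_gls = (xs * ys).sum(axis=1) / (xs * xs).sum(axis=1)
    return float(b_gls.var() / b_ols.var())

if __name__ == "__main__":
    for lam in (0.0, 0.5, 0.95):
        print(lam, round(simulate_efficiency(lam=lam), 3))
```

Values well below 1 indicate that ordinary least squares is inefficient relative to the comparison estimator. Under these assumptions the ratio should rise as `lam` approaches 1, in line with the remark above.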



Tests for Serial Correlation 



The existence of serial correlation may be investigated in a 
variety of ways. Figure 3 indicates a common feature of serially cor- 
related data in the tendency for runs of positive and negative residuals 
from the fitted curve. An obvious initial step is to graph the data 
and to consider the patterns in the residuals. Mandel (1957, p. 562) 
applies a test for the cumulative error model based on the number of 
times the residuals change their sign. 

Several tests for the existence of serial correlation of the auto- 
regressive kind have been proposed. Probably the most easily applied is 
that based on the Durbin-Watson d-statistic, defined as 



(22)    $d = \dfrac{\sum_{t=2}^{p} (z_t - z_{t-1})^2}{\sum_{t=1}^{p} z_t^2}$

where the $z_t$ are the residuals from the fitted regression line.

Durbin and Watson (1951) provide tables of lower and upper bounds
$d_\ell^\alpha$ and $d_u^\alpha$ for values of p from 15 to 100 and of k (degree)
from 1 to 5, for single tail significance level $\alpha$ = .05, .025, and
.01. The null hypothesis is $\rho = 0$ in the model $e_t = \rho e_{t-1} + u_t$.
Against the alternative hypothesis $\rho > 0$, the null hypothesis is
rejected if $d < d_\ell^\alpha$, not rejected if $d > d_u^\alpha$, and the test is
inconclusive otherwise. Against the alternative hypothesis $\rho < 0$, the
null hypothesis is rejected if $4 - d < d_\ell^\alpha$, not rejected if
$4 - d > d_u^\alpha$, and the test is inconclusive otherwise. The alternative
hypothesis $\rho \neq 0$ may be tested by a combination of the above tests
at the significance level $\alpha/2$. For example, for tests of the hypothesis
$\rho > 0$ for the data of Figure 3, the d value of 1.98 for the model with
$\rho = 0.2$ is not significant at the .05 level, while the d value of 0.90
for the model with $\rho = 0.8$ is significant at the .01 level. Durbin (1970)
provides an exact test when this bounds test is inconclusive. For a
polynomial regression on t up to degree 5, Theil and Nagar (1961) provide
approximate 1% and 5% significance points.

Durbin (1969) has also developed a graphical method for a more 
general test of departures from serial independence. 
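
The d-statistic of (22) and the bounds decision rule can be sketched in a few lines. This is a minimal illustration, assuming equally spaced time points t = 1, ..., p and an ordinary least squares polynomial fit. The function names are hypothetical, and the bounds `d_lower` and `d_upper` must be taken from the published tables; none are built in.

```python
import numpy as np

def polynomial_residuals(y, degree=1):
    """Residuals z_t from an OLS polynomial fit on t = 1, ..., p."""
    y = np.asarray(y, dtype=float)
    t = np.arange(1, len(y) + 1, dtype=float)
    X = np.vander(t, degree + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def durbin_watson(z):
    """d = sum (z_t - z_{t-1})^2 / sum z_t^2, as in (22)."""
    z = np.asarray(z, dtype=float)
    return float(np.sum(np.diff(z) ** 2) / np.sum(z ** 2))

def bounds_test_positive(d, d_lower, d_upper):
    """Bounds test of rho = 0 against rho > 0, given tabulated bounds."""
    if d < d_lower:
        return "reject"
    if d > d_upper:
        return "do not reject"
    return "inconclusive"

def bounds_test_negative(d, d_lower, d_upper):
    """Bounds test of rho = 0 against rho < 0: apply the same rule to 4 - d."""
    return bounds_test_positive(4.0 - d, d_lower, d_upper)
```

A typical call would be `durbin_watson(polynomial_residuals(y, degree=2))`, followed by a lookup of the bounds for the relevant p, k, and significance level.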

Conclusions 

Studies of regression problems with serial dependence in the 
residuals have been concerned primarily with evaluating the effects of 
violating the ordinary least squares assumptions. As pointed out here, 
ordinary least squares estimators of the coefficients in a polynomial 
regression such as 

        $y_t = \beta_0 + \beta_1 t + \beta_2 t^2 + e_t$

are unbiased and may still be efficient even when $\Sigma \neq \sigma_e^2 I$. However,
inferences about $\beta_0$, $\beta_1$, and $\beta_2$ based on ordinary least squares
procedures may be quite misleading.

Little consideration has been given, in the case of individual series,
to the development of alternative procedures or to the problem of providing
a more reliable basis for statements of inference, except for the
econometric model. Although there are no clear-cut procedures to follow, if
it has been determined that there is serial dependence of an extent to
make ordinary least squares inappropriate, several alternative possibilities
have been suggested.

The initial problem is to settle on an appropriate model for the form 
of dependence. If the choice is restricted to either a cumulative or a 
first-order autoregressive model, investigation of the residuals from a 
fitted ordinary least squares regression may provide sufficient information 
to distinguish the more appropriate model. Selection of a first-order 
autoregressive model against a higher-order model may be based on con- 
sideration of the sample serial correlation coefficients of lag 1 and 
higher (Holtzman, 1963). There are some potential problems in 
arriving at an appropriate model for the form of serial dependence, due 
to the bias toward randomness of the residuals mentioned in the section 
concerning the effects of serial correlation on conventional least 
squares procedures. 

Procedures for analyzing data for which a cumulative error model is
appropriate are detailed in the articles of Mandel (1959) and Jaech
(1964). For a first-order autoregressive model, C. R. Rao (1967) presents
a procedure for estimating the coefficients $\beta$ in a polynomial regression.
Closely related to this are the approaches investigated by P. Rao and
Griliches (1969).

C. R. Rao (1967), noting that in the first-order autoregressive
error model one can write $u_t = e_t - \rho e_{t-1}$, expresses the polynomial
model in the form

        $u_t = y_t - \rho y_{t-1} - \beta_0 (1 - \rho) - \beta_1 (t - \rho (t-1)) - \cdots - \beta_k (t^k - \rho (t-1)^k)$ .

Then $\rho$ and the $\beta_i$ are estimated simultaneously by minimizing $\sum_t u_t^2$.
This approach looks interesting but no specific information is available
on the characteristics of the estimators obtained.
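
The minimization can be illustrated with a simple numerical scheme. For any fixed value of $\rho$ the expression for $u_t$ is linear in the $\beta_i$, so one may search over a grid of $\rho$ values and solve an ordinary least squares problem at each. This grid search is an assumption made for the sketch and is not claimed to be C. R. Rao's own computational procedure; the function name and the grid are hypothetical.

```python
import numpy as np

def fit_ar1_polynomial(y, degree=1, rho_grid=None):
    """Minimize sum u_t^2 over rho (grid) and beta (least squares given rho)."""
    y = np.asarray(y, dtype=float)
    p = len(y)
    t = np.arange(1, p + 1, dtype=float)
    if rho_grid is None:
        rho_grid = np.linspace(-0.95, 0.95, 191)
    T_now = np.vander(t[1:], degree + 1, increasing=True)    # t**j, t = 2..p
    T_prev = np.vander(t[:-1], degree + 1, increasing=True)  # (t-1)**j
    best = None
    for rho in rho_grid:
        Z = T_now - rho * T_prev          # columns t**j - rho (t-1)**j
        w = y[1:] - rho * y[:-1]          # y_t - rho y_{t-1}
        beta, *_ = np.linalg.lstsq(Z, w, rcond=None)
        sse = float(np.sum((w - Z @ beta) ** 2))   # sum of u_t^2 at this rho
        if best is None or sse < best[0]:
            best = (sse, float(rho), beta)
    sse, rho_hat, beta_hat = best
    return rho_hat, beta_hat, sse
```

The function returns the grid value of $\rho$ with the smallest sum of squares, the corresponding coefficient estimates, and the minimized sum. A finer grid or a one-dimensional optimizer could replace the grid where more precision is wanted.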

Rao and Griliches have suggested similar approaches for the
econometric model (20), although minimizing $\sum_t u_t^2$ presents problems in the
general situation of random $x_t$ since the relationship is nonlinear in
$\beta$ and $\rho$. Their investigations do suggest that C. R. Rao's approach
may be useful for the polynomial regression model.

While consideration of the analysis of individual growth curves
provides some insight into the problems presented by the occurrence of
serial dependence, educational data are most often available for more than
one individual. In this situation, the problems of postulating a model and
estimating several parameters, all from a small amount of data, are avoided
if one is willing to make certain assumptions about similarities of models
for individuals. Gaito and Wiley (1963) and Bock (1963) provide an
introduction to the literature in this area.

APPENDIX



Orthogonal Polynomial Coefficients 

This paper discusses the effects of serial dependence on analyses 
based on polynomial regression coefficients. Generally, orthogonal 
polynomial coefficients have been preferred in the psychological and 
educational literature dealing with repeated measurements data as being 
more descriptive of the data (see, e.g., Gaito & Wiley, 1963; Bock, 
1963). Some of the relationships between the two approaches will be 
briefly indicated here. 

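In equation (3) the general form of the polynomial regression model
is expressed as

(A.1)    $y = X \beta + e$

where

        $X = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 4 & \cdots & 2^{k-1} \\ 1 & 3 & 9 & \cdots & 3^{k-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & p & p^2 & \cdots & p^{k-1} \end{pmatrix}$

for equally spaced time points.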

Analysis based on orthogonal polynomial coefficients proceeds in
the same way as for the polynomial regression, but the matrix X is
replaced by a matrix of orthogonal polynomials, denoted for convenience
as P. The vector of coefficients is denoted as $\gamma$. Hence the model (3)
may be expressed as

(A.2)    $y = P \gamma + e$ .

The matrix P is obtained by a process of "orthogonalizing" the
matrix X. Standard procedures exist for this, but the elements of P
are readily available in tabulated form when the time points are equally
spaced (Winer, 1962; Fisher & Yates, 1957). For unequally spaced
time points, see Gaito (1965).

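As an example, the matrix P corresponding to a quadratic polynomial
based on 5 time points is

        $P = \begin{pmatrix} 1/\sqrt{5} & -2/\sqrt{10} & 2/\sqrt{14} \\ 1/\sqrt{5} & -1/\sqrt{10} & -1/\sqrt{14} \\ 1/\sqrt{5} & 0 & -2/\sqrt{14} \\ 1/\sqrt{5} & 1/\sqrt{10} & -1/\sqrt{14} \\ 1/\sqrt{5} & 2/\sqrt{10} & 2/\sqrt{14} \end{pmatrix}$ .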



P has the characteristic that the columns are orthogonal to each
other; i.e., the sum of the products of the elements is zero. This leads
to the result:

        $P'P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = I$ .

Consequently, the formulae for $\hat{\gamma}$ and Var $\hat{\gamma}$ are considerably
simpler than for $\hat{\beta}$ and Var $\hat{\beta}$. The estimators of $\gamma$ from least
squares theory are:

(A.3)    $\hat{\gamma} = P'y$

with variance-covariance matrix

(A.4)    $V(\hat{\gamma}) = \sigma_e^2 I$ .

This latter formula indicates the algebraic independence among the
estimates of the coefficients when the errors are independent. Since the
variance-covariance matrix is diagonal, the individual coefficients are
independent; e.g., in the example shown in Table 2, $\hat{\gamma}_0$, $\hat{\gamma}_1$,
$\hat{\gamma}_2$ are algebraically independent. This is not true for the polynomial
regression coefficients, for which there exists an algebraic dependence
between $\hat{\beta}_0$, $\hat{\beta}_1$, $\hat{\beta}_2$, etc.
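
For readers without access to the tables, the orthogonalizing step can be sketched numerically. The example below uses a QR decomposition of the matrix of powers of t as one standard way of orthogonalizing; this choice, the sign convention (each column is made positive at the last time point), and the function names are assumptions of the sketch, not procedures described in the sources cited above.

```python
import numpy as np

def orthonormal_polynomials(p, degree):
    """Orthonormal polynomial matrix P for equally spaced points t = 1, ..., p."""
    t = np.arange(1, p + 1, dtype=float)
    X = np.vander(t, degree + 1, increasing=True)   # columns 1, t, ..., t**degree
    Q, _ = np.linalg.qr(X)                          # reduced QR: Q is p x (degree+1)
    return Q * np.sign(Q[-1, :])                    # fix signs: last row positive

def gamma_hat(y, degree):
    """Least squares estimates of the orthogonal polynomial coefficients, P'y."""
    y = np.asarray(y, dtype=float)
    P = orthonormal_polynomials(len(y), degree)
    return P.T @ y

if __name__ == "__main__":
    P = orthonormal_polynomials(5, 2)
    print(np.round(P * np.sqrt([5.0, 10.0, 14.0]), 6))  # integer contrast values
    print(np.round(P.T @ P, 6))                          # identity matrix
```

Rescaling the columns by $\sqrt{5}$, $\sqrt{10}$, and $\sqrt{14}$ recovers the integer coefficients of the 5-point quadratic example, and `P.T @ P` reproduces the identity matrix.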

When serial dependence exists between observations, formula (14)
for the variance of the ordinary least squares estimators becomes

        $\mathrm{Var}(\hat{\gamma}) = (P'P)^{-1} P' \Sigma P (P'P)^{-1}$

                                   $= P' \Sigma P$    since $P'P = I$ .

It can be seen from this formula that when $\Sigma \neq \sigma_e^2 I$ the o.l.s.
estimates of the orthogonal polynomial coefficients no longer have the
advantage of being independent. Relative efficiency figures derived from
these formulae for the orthogonal polynomial coefficients are quite
similar to those for the ordinary polynomial coefficients, and hypothesis
testing behavior should follow the same pattern.
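
The loss of independence can be seen by evaluating $P' \Sigma P$ for a particular error structure. The sketch below assumes a first-order autoregressive error with marginal variance `sigma_e2` and builds P by a QR decomposition, as one convenient orthogonalization; the function name is hypothetical and the numerical settings are illustrative only.

```python
import numpy as np

def gamma_covariance(p, degree, rho, sigma_e2=1.0):
    """Var(gamma_hat) = P' Sigma P for AR(1) errors on t = 1, ..., p."""
    t = np.arange(1, p + 1, dtype=float)
    X = np.vander(t, degree + 1, increasing=True)
    P, _ = np.linalg.qr(X)                  # orthonormal columns, P'P = I
    P = P * np.sign(P[-1, :])               # sign convention: last row positive
    idx = np.arange(p)
    Sigma = sigma_e2 * rho ** np.abs(idx[:, None] - idx[None, :])
    return P.T @ Sigma @ P

if __name__ == "__main__":
    print(np.round(gamma_covariance(5, 2, 0.0), 3))   # identity: independent errors
    print(np.round(gamma_covariance(5, 2, 0.5), 3))   # off-diagonal terms appear
```

With `rho = 0` the result is the diagonal matrix of (A.4). With `rho = 0.5` and five time points, the covariance between the constant and quadratic coefficients is clearly nonzero, so the estimates are no longer independent.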